7 research outputs found

    N-gram analysis of 970 microbial organisms reveals presence of biological language models

    Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts, such as "signature-style" word usage indicative of authors or topics, and that algorithms originally developed for natural language processing may therefore be applied to genome sequences to draw biologically relevant conclusions. Following this approach of "biological language modeling", statistical n-gram analysis was applied to the comparative analysis of whole proteome sequences of 44 organisms. A few particular amino acid n-grams were found in abundance in one organism while occurring very rarely in others, thereby serving as genome signatures. At that time the proteomes of only 44 organisms were available, which limited the generalization of this hypothesis. Today nearly 1,000 genome sequences and corresponding translated sequences are available, making it feasible to test the existence of biological language models over the evolutionary tree.
    Results: We studied whole proteome sequences of 970 microbial organisms using n-gram frequencies and cross-perplexity, employing the Biological Language Modeling Toolkit and the Patternix Revelio toolkit. Genus-specific signatures were observed even in a simple unigram distribution. By taking the statistical n-gram model of one organism as reference and computing the cross-perplexity of all other microbial proteomes against it, cross-perplexity was found to be predictive of branch distance in the phylogenetic tree. For example, a 4-gram model from the proteome of Shigella flexneri 2a, which belongs to the class Gammaproteobacteria, showed a self-perplexity of 15.34, while the cross-perplexity of other organisms was in the range of 15.59 to 29.5 and was proportional to their branching distance from S. flexneri in the evolutionary tree. The organisms of this genus, which happen to be pathotypes of E. coli, also have the closest perplexity values to E. coli.
    Conclusion: Whole proteome sequences of microbial organisms have been shown to contain particular n-gram sequences that are abundant in one organism but occur very rarely in others, thereby serving as proteome signatures. Furthermore, perplexity, a statistical measure of similarity of n-gram composition, can be used to predict evolutionary distance within a genus in the phylogenetic tree.
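    The cross-perplexity comparison described in this abstract can be illustrated with a minimal sketch. It is not the Biological Language Modeling Toolkit or Patternix Revelio; the function names, the add-one smoothing, and the fixed 20-letter amino acid alphabet are all assumptions made for illustration. An n-gram model is counted from a reference proteome, and perplexity is then computed either on the same proteome (self-perplexity) or on another one (cross-perplexity).

```python
import math
from collections import Counter

# Assumption: only the 20 standard amino acids occur in the sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_ngram_counts(sequences, n):
    """Count n-grams and their (n-1)-residue contexts over protein sequences."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts


def perplexity(model, sequences, n, alpha=1.0):
    """Perplexity of `sequences` under `model`, with additive smoothing.

    Hypothetical helper: alpha=1.0 is add-one smoothing, chosen here only
    so that unseen n-grams do not receive zero probability.
    """
    ngrams, contexts = model
    vocab = len(AMINO_ACIDS)
    log_prob, total = 0.0, 0
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            # P(last residue | preceding n-1 residues), smoothed.
            p = (ngrams[gram] + alpha) / (contexts[gram[:-1]] + alpha * vocab)
            log_prob += math.log2(p)
            total += 1
    if total == 0:
        raise ValueError("sequences are shorter than n")
    return 2 ** (-log_prob / total)


# Toy data, not real proteomes.
reference = ["ACDEACDEACDE"]
distant = ["WYWYWYWY"]

model = train_ngram_counts(reference, n=2)
self_pp = perplexity(model, reference, n=2)   # reference scored on itself
cross_pp = perplexity(model, distant, n=2)    # another "organism" scored on it
```

    With this toy data, the self-perplexity is lower than the cross-perplexity of the unrelated sequence, which is the qualitative pattern the abstract reports; the actual values in the study depend on the toolkits and smoothing the authors used.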

    Tight junctions and the modulation of barrier function in disease

    Tight junctions create a paracellular barrier in epithelial and endothelial cells, protecting them from the external environment. Two classes of integral membrane proteins constitute the tight junction strands in epithelial and endothelial cells: occludin and members of the claudin protein family. In addition, cytoplasmic scaffolding molecules associated with these junctions regulate diverse physiological processes such as proliferation, cell polarity and regulated diffusion. In many diseases, this regulated barrier is disrupted. This review briefly describes the molecular composition of tight junctions and then presents evidence of the link between tight junction dysfunction and disease.

    Insights into the Immunological Properties of Intrinsically Disordered Malaria Proteins Using Proteome Scale Predictions

    Malaria remains a significant global health burden. The development of an effective malaria vaccine, which has the potential to significantly reduce morbidity and mortality, remains a major challenge. While Plasmodium spp. have been shown to contain a large number of intrinsically disordered proteins (IDPs) or disordered protein regions, the relationship of protein structure to subcellular localisation and adaptive immune responses remains unclear. In this study, we employed several computational prediction algorithms to identify IDPs at the proteome level of six Plasmodium spp. and to investigate the potential impact of protein disorder on adaptive immunity against P. falciparum parasites. IDPs were shown to be particularly enriched within nuclear proteins, apical proteins, exported proteins and proteins localised to the parasitophorous vacuole. Furthermore, several leading vaccine candidates, and proteins with known roles in host-cell invasion, have extensive regions of disorder. Presentation of peptides by MHC molecules plays an important role in adaptive immune responses, and we show that IDP regions are predicted to contain relatively few MHC class I and II binding peptides owing to inherent differences in amino acid composition compared to structured domains. In contrast, linear B-cell epitopes were predicted to be enriched in IDPs. Tandem repeat regions and non-synonymous single nucleotide polymorphisms were found to be strongly associated with regions of disorder. In summary, immune responses against IDPs appear to have characteristics distinct from those against structured protein domains, with increased antibody recognition of linear epitopes but some constraints on MHC presentation and issues with polymorphisms. These findings have major implications for vaccine design and for understanding immunity to malaria.

    A review of therapies for diabetic macular oedema and rationale for combination therapy


    Monocyte chemoattractant protein-1 and the blood–brain barrier
